Characteristics and phylogenetic analysis of the complete chloroplast genome of Primulina hedyotidea

Abstract Primulina hedyotidea (Woon Young Chun) Yin Zheng Wang 2011 is an important medicinal plant that has a long history of medicinal use in China. In this experiment, the whole chloroplast genome of P. hedyotidea was determined by next-generation sequencing technology. The total base length of P. hedyotidea was 153,297 bp, the GC content was 37.62%, the inverted repeat (IR) region length was 25,494 bp, the large single copy (LSC) region was 84,158 bp and the small single copy (SSC) region was 18,151 bp. In addition, the genome consisted of 80 protein-coding genes, 4 rRNA genes, and 28 tRNA genes, for a total of 112 genes. A phylogenetic tree was constructed to explore the evolutionary relationship between P. hedyotidea and other species. The findings of phylogenetic tree analysis show that Primulina huaijiensis and P. hedyotidea have a close relationship, and this study can help with species identification and phylogenetic analysis within Primulina and Gesneriaceae species.


Materials and methods
We collected fresh leaves (Figure 1) in September 2021 from the campus of Guangxi University of Chinese Medicine (N22 48 0 14 00 ; E 108 30 0 4 00 ) and extracted the genomic DNA of P. hedyotidea.The collected specimen is stored in the Herbarium of Guangxi University of Chinese Medicine (voucher number 202109HJGC001) and can be accessed through the contact person Haicheng Wen (wenhaicheng2015@qq.com).
(A) Vegetative body.The plant is typically 12-18 cm tall with a long rhizome.The base leaf blade is oblong-lanceolate, 6.5-10 cm long, 0.9-2.4cm wide, and has 3-4 lateral veins on each side.The flowers of P. hedyotidea are dish-cupshaped and approximately 0.5 mm tall.
To obtain the original sequence data, we used the cetyltrimethylammonium bromide (CTAB) method (Roomi et al. 2013) to amplify the genomic DNA sequenced on an Illumina HiSeq4000 platform (Eo 2021).The second-generation sequencing data quality statistical software Cutadapt (version 1.9.1) was used to remove the linker and lowquality sequences from the pass filter data to obtain clean data for subsequent information analysis.Based on the optimized data, clean data were aligned to the reference genomic sequence using BWA (version 0.7.12) alignment software, and the aligned sequences were extracted.The segmented contig sequences were spliced using NOVOPlasty (version 2.7.2) software and further assembled into scaffold sequences with SSPACE (version 3.0) software.Finally, scaffold sequences with a low proportion of unknown N bases and long sequence lengths were obtained through processing with GapFiller (version 1-10) software.Predictive and noncoding 'transfer RNA (tRNA)' or 'ribosomal RNA (rRNA)' analysis of encoded genes was performed using the GeSeq (version 1.78) software and Rfam (version 12.0) database (Fan et al. 2020).CPGView (http:// www.1kmpg.cn/cpgview)online tool was used to draw the P. hedyotidea genome circle and analyze difficult-to-annotate plastome genes (cis-splicing genes and trans-splicing genes) (Liu et al. 2023).The unique genes of P. hedyotidea chloroplast genome were annotated using CPGAVAS2 web service (Shi et al. 2019), then complete chloroplast genome sequence of the P. hedyotidea submitted to GenBank (Accession number: OM283823).To obtain the evolutionary history information of P. hedyotidea, we used MEGA X to construct a phylogenetic tree (Kumar et al. 2018).First, we downloaded the complete gene sequences of 14 species related to P. hedyotidea from GenBank and compared the chloroplast genome sequences with ClustalW sequences (Peng et al. 2022).Second, the maximum likelihood (ML) tree was produced using the Tamura-Nei substitution model (Tamura and Nei 1993) and the bootstrap method (1000 repetitions) for testing the phylogeny (Kumar et al. 2016).

Results
According to the assembly results, the length of the genome sequence was 153,297 bp, with an average depth of 542.90X (Supplementary Figure 1).The GC content was 37.62%, the inverted repeat (IR) region length was 25,494 bp, the large single copy (LSC) region was 84,158 bp and the small single copy (SSC) region was 18,151 bp (Table 1).In addition, the genome consisted of 80 protein-coding genes, 4 rRNA genes (rrn16S, rrn23S, rrn4.5S,rrn5S), and 28 tRNA genes, for a total of 112 genes (Figure 2).80 protein-coding genes may be categorized into the following four groups based on their functions: A.44 photosynthesis-related genes, B.25 self-replication-related genes, C.7 other coding genes, and D.4 Unkown functional genes.
The circular chloroplast genome map displays 80 proteincoding genes, 28 tRNA genes, and 4 rRNA genes.Different categories of genes are labeled with distinct colors.
The medicinal plant P. hedyotidea and Primulina huaijiensis were found to be clustered together in the current experiment's results (Figure 3), and species of the Primulina genus were found to be clustered together in the phylogenetic trees with a bootstrap value of 100, indicating that this genus was a confident monophyletic group.The cp genome information about P. hedyotidea provided by this study can help with species identification and phylogenetic analysis within Primulina and Gesneriaceae species.Due to the limitations of current technology, there are two separate sequence numbers for the same species since the sequence of a species may contain flaws that need ongoing updating and correction.Both the OM283823 and ON456287 complete genome sequences for P. hedyotidea were deposited in GenBank on November 27 and December 14, respectively.We compared OM283823 and ON456287 in terms of nucleotide composition and sequence accuracy.Supplementary  Table 1 illustrates all eight single nucleotide polymorphisms.Regarding oligonucleotide differences between the two sequences, eight locations were identified (Supplementary Table 2).We found that the mapping reading, which is 292952, better supported OM283823 when we mapped the read to both genomes (Supplementary Table 3).In addition, we discovered the rps19 gene (Supplementary Table 4).

Discussion and conclusion
For correction and annotation by multi-sequence alignment, cis-splicing in this study revealed that 14 genes (rps16, atpF, rpoC1, rpl2, ndhB, ndhA, ndhB_copy2, rpl2_copy2, trnG-UCC, trnL-UAA, trnI-GAU, trnA-UGC, trnA-UGC_copy2, and trnI-GAU_ copy2) included one intron, four genes(rps12, ycf3, rps12_ copy2, and clpP) contained two introns, one small-exon gene (rps16), and one trans-splicing gene (rps12), and the results are presented in Supplementary Figures 2 and 3.This study reports the chloroplast genome of P. hedyotidea for the first time and uses phylogenetic trees to analyze the relationships among related species.The phylogenetic tree indicates a close link between Primulina huaijiensis and P. hedyotidea, and this work adds new data to the phylogeny, species identification, conservation of species resources, and genetic investigations of the genus Primulina.

Figure 1 .
Figure 1.Reference image of Primulina hedyotidea taken by Haicheng Wen at the botanical garden of Guangxi University of Chinese Medicine.

Table 1 .
Structure and composition of P. hedyotidea chloroplast genome.